Abstract

This paper proposes a method to analyze Japanese anaphora, in which zero pronouns (omitted obligatory cases) are used to refer to preceding entities (antecedents). Unlike the case of general coreference resolution, zero pronouns have to be detected prior to resolution because they are not expressed in discourse. Our method integrates two probability parameters to perform zero pronoun detection and resolution in a single framework. The first parameter quantifies the degree to which a given case is a zero pronoun. The second parameter quantifies the degree to which a given entity is the antecedent for a detected zero pronoun. To compute these parameters efficiently, we use corpora with and without annotations of anaphoric relations. We show the effectiveness of our method by way of experiments.

1 Introduction

Anaphora resolution is crucial in natural language processing (NLP), particularly in discourse analysis. In the case of English, partially motivated by the Message Understanding Conferences (MUCs) (Grishman and Sundheim, 1996), a number of coreference resolution methods have been proposed.

In other languages, such as Japanese and Spanish, anaphoric expressions are often omitted. Ellipses related to obligatory cases are usually termed zero pronouns. Since zero pronouns are not expressed in discourse, they have to be detected prior to identifying their antecedents. Thus, although pleonastic pronouns in English also have to be classified as anaphoric or non-anaphoric prior to resolution, the process of analyzing Japanese zero pronouns differs from general coreference resolution in English.

For identifying anaphoric relations, existing methods are classified into two fundamental approaches: rule-based and statistical approaches.

In rule-based approaches (Grosz et al., 1995; Hobbs, 1978; Mitkov et al., 1998; Nakaiwa and Shirai, 1996; Okumura and Tamura, 1996; Palomar et al., 2001; Walker et al., 1994), anaphoric relations between anaphors and their antecedents are identified by way of hand-crafted rules, which typically rely on syntactic structures, gender/number agreement, and selectional restrictions. However, it is difficult to produce rules exhaustively, and rules that are developed for a specific language are not necessarily effective for other languages. For example, gender/number agreement in English cannot be applied to Japanese.

Statistical approaches (Aone and Bennett, 1995; Ge et al., 1998; Kim and Ehara, 1995; Soon et al., 2001) use statistical models produced from corpora annotated with anaphoric relations. However, only a few attempts have been made at corpus-based anaphora resolution for Japanese zero pronouns. One of the reasons is that it is costly to produce a sufficient volume of training corpora annotated with anaphoric relations.

In addition, the above methods focused mainly on identifying antecedents, and few attempts have been made to detect zero pronouns.

Motivated by the above background, we propose a probabilistic model for analyzing Japanese zero pronouns combined with a detection method. In brief, our model consists of two parameters associated with zero pronoun detection and antecedent identification. We focus on zero pronouns whose antecedents appear in the context preceding them, because such zero pronouns are the major referential expressions in Japanese. Section 2 explains our proposed method (system) for analyzing Japanese zero pronouns. Section 3 evaluates our method by way of experiments using newspaper articles. Section 4 discusses related research literature.

2 A System for Analyzing Japanese Zero Pronouns

2.1 Overview 

Figure 1 depicts the overall design of our system
to analyze Japanese zero pronouns. We explain 
the entire process based on this figure. 

First, given an input Japanese text, our system performs morphological and syntactic analyses. In the case of Japanese, morphological analysis involves word segmentation and part-of-speech tagging because Japanese sentences lack explicit word boundaries; for this purpose we use the JUMAN morphological analyzer (Kurohashi and Nagao, 1998b). Then, we use the KNP parser (Kurohashi, 1998) to identify syntactic relations between segmented words.

Second, in the zero pronoun detection phase, the system uses syntactic relations to detect omitted cases (nominative, accusative, and dative) as zero pronoun candidates. To avoid overdetecting zero pronouns, we use the IPAL verb dictionary (Information-technology Promotion Agency, 1987), which includes case frames for 911 Japanese verbs. We discard zero pronoun candidates that are not listed in the case frames associated with the verb in question.

For verbs not listed in the IPAL dictionary, only nominative cases are regarded as obligatory. The system also computes the probability that case $c$ related to target verb $v$ is a zero pronoun, $P_{zero}(c \mid v)$, to select plausible zero pronoun candidates.

Ideally, when the verb in question is polysemous, word sense disambiguation is needed to select the appropriate case frame, because different verb senses often correspond to different case frames. However, we currently merge the multiple case frames of a verb into a single frame to sidestep the polysemy problem. This issue needs to be further explored.

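The case-frame filtering described above can be sketched as follows. This is a minimal illustration under stated assumptions: the dictionary `frames` is a hypothetical stand-in for the IPAL case frames (each verb mapped to one set of obligatory case markers per sense), the verbs `taberu` and `kaku` are illustrative only, and `merged_frame` and `zero_pronoun_candidates` are hypothetical helper names rather than parts of the system described here.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for IPAL case frames: verb -> one set of obligatory
# case markers per verb sense.
CaseFrames = Dict[str, List[Set[str]]]

TARGET_CASES = ("ga", "wo", "ni")  # nominative, accusative, dative


def merged_frame(verb: str, frames: CaseFrames) -> Set[str]:
    """Union of all case frames of a verb (no word sense disambiguation).

    Verbs missing from the dictionary fall back to the nominative case only.
    """
    if verb not in frames:
        return {"ga"}
    merged: Set[str] = set()
    for frame in frames[verb]:
        merged |= frame
    return merged


def zero_pronoun_candidates(verb: str, overt_cases: Set[str],
                            frames: CaseFrames) -> List[str]:
    """Cases that are obligatory for `verb` but not overtly expressed."""
    obligatory = merged_frame(verb, frames)
    return [c for c in TARGET_CASES if c in obligatory and c not in overt_cases]
```

Merging by set union mirrors the practice of collapsing all senses of a polysemous verb into one frame: a case that is obligatory in any sense becomes a candidate, which is exactly why the merged frame can still overdetect.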
Third, in the zero pronoun resolution (i.e., antecedent identification) phase, for each zero pronoun the system extracts antecedent candidates from the preceding context and orders them according to the extent to which they can be the antecedent for the target zero pronoun. From the viewpoint of probability theory, our task here is to compute the probability that zero pronoun $\phi$ refers to antecedent $a_i$, $P(a_i \mid \phi)$, and to select the candidate that maximizes this probability. To compute this score, we model zero pronouns and antecedents as described in Section 2.2.

Figure 1: The overall design of our system to analyze Japanese zero pronouns.



Finally, the system outputs texts containing anaphoric relations. In addition, the number of zero pronouns analyzed by the system can optionally be controlled based on the certainty score described in Section 2.4.



2.2 Modeling Zero Pronouns and Antecedents

According to past literature associated with 
zero pronoun resolution and our preliminary 
study, we use the following six features to model 
zero pronouns and antecedents. 

• Features for zero pronouns 

- Verbs that govern zero pronouns (v), which 
denote verbs whose cases are omitted. 

- Surface cases related to zero pronouns (c), 
for which possible values are Japanese case 
marker suffixes, ga (nominative), wo (ac- 
cusative), and ni (dative). Those values 
indicate which cases are omitted. 

• Features for antecedents 

- Post-positional particles (p), which play crucial roles in resolving Japanese zero pronouns (Kameyama, 1986; Walker et al., 1991).

- Distance (d), which denotes the distance (proximity) between a zero pronoun and an antecedent candidate in an input text. When they occur in the same sentence, its value is 0. When an antecedent occurs n sentences before the sentence containing a zero pronoun, its value is n.

- Constraint related to relative clauses (r), which denotes whether or not an antecedent is included in a relative clause. If it is included, the value of r is true; otherwise, it is false. The rationale behind this feature is that Japanese zero pronouns tend not to refer to noun phrases in relative clauses.

- Semantic classes (n), which represent semantic classes associated with antecedents. We use 544 semantic classes defined in the Japanese Bunruigoihyou thesaurus (National Language Research Institute, 1964), which contains 55,443 Japanese nouns.

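To make the six features concrete, the following sketch stores them in two small records and derives the distance feature d from sentence positions. The class names `ZeroPronoun` and `Candidate`, and the representation of positions as sentence indices, are assumptions for illustration only; only preceding antecedents are accepted, matching the scope stated in the Introduction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZeroPronoun:
    verb: str            # v: verb whose case is omitted
    case: str            # c: "ga", "wo", or "ni"
    sentence_index: int  # position of the sentence containing the verb


@dataclass(frozen=True)
class Candidate:
    particle: str             # p: post-positional particle after the candidate
    sentence_index: int       # position of the candidate's sentence
    in_relative_clause: bool  # r: True if inside a relative clause
    semantic_class: str       # n: thesaurus class (or the noun itself)


def distance(zp: ZeroPronoun, cand: Candidate) -> int:
    """d: 0 for the same sentence, n if the candidate is n sentences earlier."""
    d = zp.sentence_index - cand.sentence_index
    if d < 0:
        raise ValueError("only preceding antecedents are considered")
    return d
```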
2.3 Our Probabilistic Model for Zero Pronoun Detection and Resolution

We consider the probability that unsatisfied case $c$ related to verb $v$ is a zero pronoun, $P_{zero}(c \mid v)$, and the probability that zero pronoun $\phi_c$ refers to antecedent $a_i$, $P(a_i \mid \phi_c)$. Thus, the probability that case $c$ ($\phi_c$) is zero-pronominalized and refers to candidate $a_i$ is formalized as in Equation (1).

$$P(a_i \mid \phi_c) \cdot P_{zero}(c \mid v) \tag{1}$$

Here, $P_{zero}(c \mid v)$ and $P(a_i \mid \phi_c)$ are computed in the detection and resolution phases, respectively (see Figure 1).

Since zero pronouns are omitted obligatory cases, whether or not case $c$ is a zero pronoun depends on the extent to which case $c$ is obligatory for verb $v$. Case $c$ is likely to be obligatory for verb $v$ if $c$ frequently co-occurs with $v$. Thus, we compute $P_{zero}(c \mid v)$ based on the co-occurrence frequency of $(v, c)$ pairs, which can be extracted from unannotated corpora. $P_{zero}(c \mid v)$ is set to 1 when $c$ is ga (nominative), regardless of the target verb, because ga is obligatory for most Japanese verbs.

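The exact estimator for $P_{zero}(c \mid v)$ is not spelled out above, so the following is a minimal sketch assuming a relative frequency: the fraction of occurrences of verb v in unannotated text in which case c is overtly filled, with ga fixed to 1. The input format (one pair of verb and overt cases per verb token) and the function name `estimate_p_zero` are hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable, Set, Tuple


def estimate_p_zero(occurrences: Iterable[Tuple[str, Set[str]]]
                    ) -> Callable[[str, str], float]:
    """Return p_zero(c, v) estimated from unannotated verb occurrences.

    Assumed estimator: F(v, c) / F(v), i.e. the share of tokens of v
    in which case c appears overtly.
    """
    verb_freq: Counter = Counter()
    pair_freq: Counter = Counter()
    for verb, cases in occurrences:
        verb_freq[verb] += 1
        for c in cases:
            pair_freq[(verb, c)] += 1

    def p_zero(c: str, v: str) -> float:
        if c == "ga":           # nominative treated as always obligatory
            return 1.0
        if verb_freq[v] == 0:   # unseen verb: no evidence for c
            return 0.0
        return pair_freq[(v, c)] / verb_freq[v]

    return p_zero
```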
Given the formal representation for zero pronouns and antecedents in Section 2.2, the probability $P(a_i \mid \phi)$ is expressed as in Equation (2).

$$P(a_i \mid \phi) = P(p_i, d_i, r_i, n_i \mid v, c) \tag{2}$$

To improve the efficiency of probability estimation, we decompose the right-hand side of Equation (2) as follows.

Since a preliminary study showed that $d_i$ and $r_i$ were relatively independent of the other features, we approximate Equation (2) as in Equation (3).

$$\begin{aligned} P(a_i \mid \phi) &\approx P(p_i, n_i \mid v, c) \cdot P(d_i) \cdot P(r_i) \\ &= P(p_i \mid n_i, v, c) \cdot P(n_i \mid v, c) \cdot P(d_i) \cdot P(r_i) \end{aligned} \tag{3}$$

Given that $p_i$ is independent of $v$ and $n_i$, we can further approximate Equation (3) to derive Equation (4).

$$P(a_i \mid \phi_c) \approx P(p_i \mid c) \cdot P(d_i) \cdot P(r_i) \cdot P(n_i \mid v, c) \tag{4}$$

Here, the first three factors, $P(p_i \mid c) \cdot P(d_i) \cdot P(r_i)$, are related to syntactic properties, and $P(n_i \mid v, c)$ is a semantic property associated with zero pronouns and antecedents. We shall call the former and the latter the "syntactic" and "semantic" models, respectively.

Each parameter in Equation (4) is computed as in Equations (5), where $F(x)$ denotes the frequency of $x$ in corpora annotated with anaphoric relations.

$$\begin{aligned} P(p_i \mid c) &= \frac{F(p_i, c)}{\sum_j F(p_j, c)} \\ P(d_i) &= \frac{F(d_i)}{\sum_j F(d_j)} \\ P(r_i) &= \frac{F(r_i)}{\sum_j F(r_j)} \\ P(n_i \mid v, c) &= \frac{F(n_i, v, c)}{\sum_j F(n_j, v, c)} \end{aligned} \tag{5}$$


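The relative-frequency estimates of the syntactic model in Equations (5) can be sketched as below, assuming each annotated (zero pronoun, antecedent) pair has been reduced to a tuple (p, c, d, r). The class `SyntacticModel` is a hypothetical name. The text does not mention smoothing, so none is applied here and unseen feature values receive probability 0.

```python
from collections import Counter
from typing import Iterable, Tuple

# One annotated (zero pronoun, antecedent) pair reduced to (p, c, d, r).
Example = Tuple[str, str, int, bool]


class SyntacticModel:
    """Relative-frequency estimates of P(p|c), P(d), and P(r)."""

    def __init__(self, examples: Iterable[Example]):
        self.pc: Counter = Counter()  # F(p, c)
        self.c: Counter = Counter()   # sum over p of F(p, c)
        self.d: Counter = Counter()   # F(d)
        self.r: Counter = Counter()   # F(r)
        self.n = 0                    # number of annotated pairs
        for p, c, d, r in examples:
            self.pc[(p, c)] += 1
            self.c[c] += 1
            self.d[d] += 1
            self.r[r] += 1
            self.n += 1

    def p_particle(self, p: str, c: str) -> float:
        return self.pc[(p, c)] / self.c[c] if self.c[c] else 0.0

    def p_distance(self, d: int) -> float:
        return self.d[d] / self.n if self.n else 0.0

    def p_relative(self, r: bool) -> float:
        return self.r[r] / self.n if self.n else 0.0

    def score(self, p: str, c: str, d: int, r: bool) -> float:
        """Syntactic part of Equation (4): P(p|c) * P(d) * P(r)."""
        return self.p_particle(p, c) * self.p_distance(d) * self.p_relative(r)
```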

However, since estimating the semantic model, $P(n_i \mid v, c)$, requires large-scale annotated corpora, the data sparseness problem is crucial. Thus, we explore the use of unannotated corpora.

In $P(n_i \mid v, c)$, $v$ and $c$ are features of a zero pronoun, and $n_i$ is a feature of an antecedent. However, we can regard $v$, $c$, and $n_i$ as features of a verb and its case noun, because zero pronouns are omitted case nouns. Thus, it is possible to estimate the probability based on co-occurrences of verbs and their case nouns, which can be extracted automatically from large-scale unannotated corpora.

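Because $P(n_i \mid v, c)$ only needs (verb, case, noun class) co-occurrences, it can be estimated from unannotated text with the same relative-frequency formula as in Equations (5). The sketch below assumes such triples are already available; `semantic_model` and `rank_antecedents` are hypothetical helpers, and the syntactic part of Equation (4) is passed in as a precomputed number per candidate.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def semantic_model(triples: Iterable[Tuple[str, str, str]]
                   ) -> Callable[[str, str, str], float]:
    """P(n|v,c) = F(n,v,c) / sum_j F(n_j,v,c) from (v, c, n) co-occurrences."""
    joint = Counter(triples)          # F(n, v, c), keyed as (v, c, n)
    marginal: Counter = Counter()     # sum_j F(n_j, v, c), keyed as (v, c)
    for (v, c, _n), f in joint.items():
        marginal[(v, c)] += f

    def p(n: str, v: str, c: str) -> float:
        total = marginal[(v, c)]
        return joint[(v, c, n)] / total if total else 0.0

    return p


def rank_antecedents(candidates: List[Tuple[str, float, str]], v: str, c: str,
                     p_sem: Callable[[str, str, str], float]
                     ) -> List[Tuple[float, str]]:
    """Order (label, syntactic score, class) candidates by Equation (4)."""
    scored = [(syn * p_sem(n, v, c), label) for label, syn, n in candidates]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored
```

For example, with the verb taberu and the omitted case wo, a candidate whose class often fills the wo slot of taberu can outrank a candidate with a higher syntactic score.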
2.4 Computing Certainty Score 

Since zero pronoun analysis is not a stand-alone 
application, our system is used as a module in 
other NLP applications, such as machine trans- 
lation. In those applications, it is desirable that 
erroneous anaphoric relations are not generated. 
Thus, we propose a notion of certainty to out- 
put only zero pronouns that are detected and 
resolved with a high certainty score. 

We formalize the certainty score, $C(\phi_c)$, for each zero pronoun as in Equation (6), where $P_1(\phi_c)$ and $P_2(\phi_c)$ denote the probabilities computed by Equation (1) for the first- and second-ranked candidates, respectively. In addition, $t$ is a parametric constant, which is experimentally set to 0.5.

$$C(\phi_c) = t \cdot P_1(\phi_c) + (1 - t)\left(P_1(\phi_c) - P_2(\phi_c)\right) \tag{6}$$

The certainty score becomes high when $P_1(\phi_c)$ is sufficiently high and significantly greater than $P_2(\phi_c)$.

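Equations (1) and (6) can be combined into a short routine: multiply each candidate's resolution probability by $P_{zero}(c \mid v)$, then compute the certainty from the two best scores with t = 0.5. The helper names `combined_scores` and `certainty` are hypothetical, and treating a missing second candidate as $P_2 = 0$ is an assumption not covered by the text.

```python
from typing import List, Sequence


def combined_scores(p_resolution: Sequence[float], p_zero: float) -> List[float]:
    """Equation (1): P(a_i|phi_c) * P_zero(c|v) for each candidate a_i."""
    return [p * p_zero for p in p_resolution]


def certainty(scores: Sequence[float], t: float = 0.5) -> float:
    """Equation (6) from the Equation (1) scores of all candidates."""
    ranked = sorted(scores, reverse=True)
    if not ranked:
        return 0.0
    p1 = ranked[0]
    p2 = ranked[1] if len(ranked) > 1 else 0.0  # assumption for one candidate
    return t * p1 + (1 - t) * (p1 - p2)
```

A zero pronoun would then be output only if its certainty reaches a chosen threshold; a clear winner with a large margin scores higher than two nearly tied candidates.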
3 Evaluation 
3.1 Methodology 

To investigate the performance of our system, we used the Kyotodaigaku Text Corpus version 2.0 (Kurohashi and Nagao, 1998a), in which 20,000 articles from the 1995 Mainichi Shimbun newspaper were analyzed by JUMAN and KNP (i.e., the morphological and syntactic analyzers used in our system) and revised manually. From this corpus, we randomly selected 30 general articles (e.g., politics and sports) and manually annotated those articles with anaphoric relations for zero pronouns. The number of zero pronouns contained in those articles was 449.

We used leave-one-out cross-validation: we conducted 30 trials, in each of which one article was used as test input and the remaining 29 articles were used for producing a syntactic model. We used six years' worth of Mainichi Shimbun newspaper articles (Mainichi Shimbunsha, 1994-1999) to produce a semantic model based on co-occurrences of verbs and their case nouns.

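The leave-one-out protocol can be sketched as a generator over the 30 annotated articles; `leave_one_out` is a hypothetical helper. Only the syntactic model is re-estimated in each fold, while the semantic model, built from the separate newspaper collection, stays fixed across folds.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def leave_one_out(articles: Sequence[T]) -> Iterator[Tuple[T, List[T]]]:
    """Yield (test article, training articles) once per article."""
    for i, test in enumerate(articles):
        yield test, [a for j, a in enumerate(articles) if j != i]
```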
To extract verbs and their case noun pairs 
from newspaper articles, we performed a mor- 
phological analysis by JUMAN and extracted 



dependency relations using a relatively simple 
rule: we assumed that each noun modifies the nearest verb. As a result, we
obtained 12 million co-occurrences associated 
with 6,194 verb types. Then, we generalized 
the extracted nouns into semantic classes in 
the Japanese Bunruigoihyou thesaurus. In the 
case where a noun was associated with multiple 
classes, the noun was assigned to all possible 
classes. In the case where a noun was not listed 
in the thesaurus, the noun itself was regarded 
as a single semantic class. 

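The extraction and generalization steps can be sketched as follows. The token format (a surface form plus a coarse part-of-speech tag such as "noun", "particle", or "verb") is a simplified, hypothetical stand-in for JUMAN output, and reading "nearest verb" as the nearest verb to the right of the noun is an assumption based on Japanese being head-final. The case is taken from the particle immediately following the noun; `extract_triples` and `generalize` are hypothetical names.

```python
from typing import Dict, List, Tuple

CASE_PARTICLES = {"ga", "wo", "ni"}
Triple = Tuple[str, str, str]  # (verb, case, noun or class)


def extract_triples(tokens: List[Tuple[str, str]]) -> List[Triple]:
    """Attach each 'noun + case particle' to the nearest following verb."""
    triples: List[Triple] = []
    for i in range(len(tokens) - 1):
        word, pos = tokens[i]
        particle, ppos = tokens[i + 1]
        if pos != "noun" or ppos != "particle" or particle not in CASE_PARTICLES:
            continue
        for later, later_pos in tokens[i + 2:]:
            if later_pos == "verb":
                triples.append((later, particle, word))
                break
    return triples


def generalize(triples: List[Triple],
               thesaurus: Dict[str, List[str]]) -> List[Triple]:
    """Map nouns to all their thesaurus classes; unlisted nouns stay as-is."""
    out: List[Triple] = []
    for v, c, noun in triples:
        for cls in thesaurus.get(noun, [noun]):
            out.append((v, c, cls))
    return out
```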
3.2 Comparative Experiments 

Fundamentally, our evaluation is two-fold: we evaluated zero pronoun resolution (antecedent identification) alone, and a combination of detection and resolution. In the former case, we assumed that all the zero pronouns are correctly detected, and investigated the effectiveness of the resolution model, $P(a_i \mid \phi)$. In the latter case, we investigated the effectiveness of the combined model, $P(a_i \mid \phi_c) \cdot P_{zero}(c \mid v)$.

First, we compared the performance of the following models for zero pronoun resolution, $P(a_i \mid \phi)$:

• a semantic model produced from annotated corpora (Sem1),

• a semantic model produced from unannotated corpora, using co-occurrences of verbs and their case nouns (Sem2),

• a syntactic model (Syn),

• a combination of Syn and Sem1 (Both1),

• a combination of Syn and Sem2 (Both2), which is our complete model for zero pronoun resolution,

• a rule-based model (Rule).

As a control (baseline) model, we spent approximately two man-months developing a rule-based model (Rule) by analyzing ten articles in the Kyotodaigaku Text Corpus. This model uses rules typically used in existing rule-based methods: 1) post-positional particles that follow antecedent candidates, 2) proximity between zero pronouns and antecedent candidates, and 3) conjunctive particles. We did not use semantic properties in the rule-based method because they decreased the system accuracy in a preliminary study.



Table 1: Experimental results for zero pronoun resolution.

               # of correct cases (Accuracy)
Model      k = 1          k = 2          k = 3
Sem1        25 (6.2%)      46 (11.4%)     72 (17.8%)
Sem2       119 (29.5%)    193 (47.8%)    230 (56.9%)
Syn        185 (45.8%)    227 (56.2%)    262 (64.9%)
Both1       30 (7.4%)      49 (12.1%)     75 (18.6%)
Both2      205 (50.7%)    250 (61.9%)    280 (69.3%)
Rule       162 (40.1%)    213 (52.7%)    237 (58.6%)



Table 1 shows the results, where we regarded the k-best antecedent candidates as the final output and compared results for different values of k. When the correct answer was included in the k-best candidates, we judged the output correct. In addition, "Accuracy" is the ratio between the number of zero pronouns whose antecedents were correctly identified and the number of zero pronouns correctly detected by the system (404 for all the models). For each value of k, the highest accuracy across the models was achieved by Both2. Here, the average number of antecedent candidates per zero pronoun was 27 regardless of the model, and thus the accuracy would be 3.7% (1/27) if the system selected antecedents randomly.

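The k-best criterion and the random baseline can be expressed in a few lines; `k_best_accuracy` is a hypothetical helper that takes one ranked candidate list and one gold antecedent per correctly detected zero pronoun.

```python
from typing import Hashable, List, Sequence


def k_best_accuracy(rankings: Sequence[List[Hashable]],
                    gold: Sequence[Hashable], k: int) -> float:
    """Fraction of zero pronouns whose correct antecedent is in the top k."""
    if not gold:
        return 0.0
    hits = sum(1 for ranked, answer in zip(rankings, gold) if answer in ranked[:k])
    return hits / len(gold)
```

With 404 correctly detected zero pronouns, 205 hits at k = 1 give 205/404 ≈ 50.7%, and picking one of 27 candidates at random gives 1/27 ≈ 3.7%.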
Looking at the results for the two semantic models, Sem2 outperformed Sem1, which indicates that the use of co-occurrences of verbs and their case nouns was effective for identifying antecedents and for avoiding the data sparseness problem in producing a semantic model.

The syntactic model, Syn, outperformed each of the two semantic models, and therefore the syntactic features used in our model were more effective than the semantic features for identifying antecedents. When both syntactic and semantic models were used in Both2, the accuracy was further improved. While the rule-based method, Rule, achieved a relatively high accuracy, our complete model, Both2, outperformed Rule irrespective of the value of k. To sum up, we conclude that both syntactic and semantic models were effective for identifying appropriate anaphoric relations.

At the same time, since our method requires 
annotated corpora, the relation between the 
corpus size and accuracy is crucial. Thus, we 
performed two additional experiments associ- 
ated with Both2. 

In the first experiment, we varied the number of annotated articles used to produce a syntactic model, where a semantic model was produced from six years' worth of newspaper articles. In the second experiment, we varied the number of unannotated articles used to produce a semantic model, where a syntactic model was produced from 29 annotated articles. Because space is limited, Figure 2 shows the two independent results together: the dashed and solid graphs correspond to the results of the first and second experiments, respectively. Given all the articles for modeling, the resulting accuracy for each experiment was 50.7%, which corresponds to that for Both2 with k = 1 in Table 1.

Figure 2: The relation between the corpus size and accuracy for a combination of syntactic and semantic models (Both2).

In the case where the number of articles was 
varied in producing a syntactic model, the ac- 
curacy improved rapidly in the first five arti- 
cles. This indicates that a high accuracy can 
be obtained by a relatively small number of su- 
pervised articles. In the case where the amount 
of unannotated corpora was varied in producing a semantic model, the accuracy improved marginally as the corpus size increased. However,
note that we do not need human supervision to 
produce a semantic model. 

Finally, we evaluated the effectiveness of the combination of zero pronoun detection and resolution in Equation (1). To investigate the contribution of the detection model, $P_{zero}(c \mid v)$, we used $P(a_i \mid \phi_c)$ alone for comparison. In both cases, Both2 was used to compute the probability for zero pronoun resolution. We varied a threshold for the certainty score to plot coverage-accuracy graphs for zero pronoun detection (Figure 3) and antecedent identification (Figure 4).

Figure 3: The relation between coverage and accuracy for zero pronoun detection (Both2).

Figure 4: The relation between coverage and accuracy for antecedent identification (Both2).

In Figure 3, "coverage" is the ratio between the number of zero pronouns correctly detected by the system and the total number of zero pronouns in input texts, and "accuracy" is the ratio between the number of zero pronouns correctly detected and the total number of zero pronouns detected by the system. Note that since our system failed to detect some zero pronouns, the coverage could not reach 100%.

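The detection coverage and accuracy defined above can be traced by sweeping a threshold over the certainty scores. This is a sketch with a hypothetical input format: one (certainty, correct?) pair per zero pronoun detected by the system, plus the number of zero pronouns in the input texts. Because undetected zero pronouns never appear in the input list, coverage stays below 100% whenever some are missed.

```python
from typing import List, Sequence, Tuple


def coverage_accuracy(detections: Sequence[Tuple[float, bool]], n_gold: int,
                      thresholds: Sequence[float]
                      ) -> List[Tuple[float, float, float]]:
    """(threshold, coverage, accuracy) for each certainty threshold.

    detections: (certainty score, whether the detection is a true zero pronoun)
    n_gold: total number of zero pronouns in the input texts
    """
    curve = []
    for th in thresholds:
        kept = [ok for score, ok in detections if score >= th]
        correct = sum(kept)  # True counts as 1
        coverage = correct / n_gold if n_gold else 0.0
        accuracy = correct / len(kept) if kept else 0.0
        curve.append((th, coverage, accuracy))
    return curve
```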
Figure 3 shows that as the coverage decreases, the accuracy improves irrespective of the model used. Compared with $P(a_i \mid \phi_c)$ alone, our model, $P(a_i \mid \phi_c) \cdot P_{zero}(c \mid v)$, achieved higher accuracy regardless of the coverage.

In Figure 4, "coverage" is the ratio between the number of zero pronouns whose antecedents were generated and the number of zero pronouns correctly detected by the system. The accuracy improved as the coverage decreased, and our model marginally improved the accuracy over $P(a_i \mid \phi_c)$ alone.

These results show that our model was effective in improving the accuracy of zero pronoun detection and had no adverse effect on the antecedent identification process. As a result, the overall accuracy of zero pronoun detection and resolution was improved.

4 Related Work 



Kim and Ehara (1995) proposed a probabilistic model to resolve subject zero pronouns for the purpose of Japanese/English machine translation. In their model, the search scope for possible antecedents was limited to the sentence containing the zero pronouns. In contrast, our method can resolve zero pronouns in both intra- and inter-sentential anaphora.

Aone and Bennett (1995) used a decision tree
to determine appropriate antecedents for zero 
pronouns. They focused on proper and definite 
nouns used in anaphoric expressions as well as 
zero pronouns. However, their method resolves 
only anaphors that refer to organization names 
(e.g., private companies), which are generally 
easier to resolve than our case. 

Both of the above methods require annotated corpora for statistical modeling, while we
used corpora with/without annotations related 
to anaphoric relations, and thus we can eas- 
ily obtain large-scale corpora to avoid the data 
sparseness problem. 



Nakaiwa (2000) used Japanese/English bilingual corpora to identify anaphoric relations of
Japanese zero pronouns by comparing J/E sen- 
tence pairs. The rationale behind this method 
is that obligatory cases zero-pronominalized 
in Japanese are usually expressed in English. 
However, their method is not effective when the corresponding English expressions are themselves pronouns or other anaphors. Additionally,
bilingual corpora are more expensive to obtain 
than monolingual corpora used in our method. 

Finally, our method integrates a parameter 
for zero pronoun detection in computing the cer- 
tainty score. Thus, we can improve the accuracy 
of our system by discarding extraneous outputs 
with a small certainty score. 

5 Conclusion 

We proposed a probabilistic model to ana- 
lyze Japanese zero pronouns that refer to an- 
tecedents in the previous context. Our model 
consists of two probabilistic parameters corre- 
sponding to detecting zero pronouns and iden- 
tifying their antecedents, respectively. The lat- 
ter is decomposed into syntactic and semantic 
properties. To estimate those parameters ef- 
ficiently, we used annotated/unannotated cor- 
pora. In addition, we formalized the certainty 
score to improve the accuracy. Through experiments, we showed that the use of unannotated corpora was effective in avoiding the data sparseness problem and that the certainty score further improved the accuracy.

Future work would include word sense disam- 
biguation for polysemous predicate verbs to se- 
lect appropriate case frames in the zero pronoun 
detection process. 